Upcoming Event: PhD Dissertation Defense
William Ruys, Ph.D. Candidate, Oden Institute
12 – 2PM
Monday Dec 2, 2024
POB 4.304
We present scalable, high-performance computational methods for data analysis on heterogeneous architectures, with applications ranging from geospatial intelligence to large-scale graph analysis and task scheduling. Our work introduces a suite of tools optimized for modern, multi-device systems, tackling core challenges of data processing, communication, and efficient scheduling across CPUs and GPUs.
In multimodal image registration, we propose extensions to two Fourier-accelerated template matching methods tailored for registering diverse geospatial datasets.
We focus our evaluation on optical, hyperspectral (HSI), and LiDAR imagery, where differences in resolution and radiometry pose significant challenges. On an aerial survey of Rome, a proposed l-whitened phase correlation of gradient images achieves at least 8% more matches than a traditional phase correlation approach. We also provide an optimized GPU implementation of this method that uses batched GPU routines and a staged overlap of computation and communication.
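For readers unfamiliar with the baseline, here is a minimal sketch of traditional phase correlation between two same-size, single-band images, assuming the misalignment is a pure circular translation. It uses a naive DFT so that it runs with only the standard library; a practical implementation would use an FFT. It does not implement the proposed l-whitened variant or the gradient preprocessing, and the function names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) 1D DFT; the inverse includes the 1/n scaling."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def dft2(img, inverse=False):
    """Separable 2D DFT: transform every row, then every column."""
    rows = [dft(r, inverse) for r in img]
    cols = [dft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def phase_correlation(ref, moved, eps=1e-12):
    """Estimate (dy, dx) with moved[y][x] == ref[y - dy][x - dx] (indices mod size)."""
    A = dft2(ref)
    B = dft2(moved)
    # Normalized cross-power spectrum: discard magnitudes, keep the phase difference.
    R = [[b * a.conjugate() for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
    R = [[v / (abs(v) + eps) for v in row] for row in R]
    # Its inverse transform is (ideally) a delta function located at the shift.
    corr = dft2(R, inverse=True)
    h, w = len(corr), len(corr[0])
    return max(
        ((y, x) for y in range(h) for x in range(w)),
        key=lambda p: corr[p[0]][p[1]].real,
    )
```

Because a translation only changes the phase of each Fourier coefficient, normalizing the cross-power spectrum leaves a pure phase ramp whose inverse transform peaks at the offset. That normalization is the "whitening" step that variants of the method modify.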
For scalable graph construction, we present PyRKNN, a distributed Python library for all-nearest-neighbor graph construction on heterogeneous systems that supports both sparse and dense data. PyRKNN presents optimizations to distributed randomized projection tree construction, such as iterated local search, delayed coordinate redistribution, and pipelined tree construction, to achieve scalability across MPI ranks. It achieves at least 3x higher performance than similar projection tree codes developed by our lab.
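As a rough illustration of the general technique, the following is a minimal, sequential sketch of the standard randomized projection-tree approach to approximate all-nearest-neighbor search: each tree recursively splits the points at the median of their projection onto a random direction, an exact search runs inside each leaf, and candidate lists are merged across trees. This is a generic example under our own assumptions, not PyRKNN's API or algorithm; it handles only small dense data in a single process and omits MPI distribution, sparse support, and the optimizations listed above. All names are hypothetical.

```python
import heapq
import random


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def rp_leaves(points, idx, leaf_size, rng):
    """Split idx recursively at the median of a random 1D projection."""
    if len(idx) <= max(leaf_size, 1):
        return [idx]
    direction = [rng.gauss(0.0, 1.0) for _ in points[0]]
    order = sorted(idx, key=lambda i: sum(p * w for p, w in zip(points[i], direction)))
    mid = len(order) // 2
    return (rp_leaves(points, order[:mid], leaf_size, rng)
            + rp_leaves(points, order[mid:], leaf_size, rng))


def all_knn(points, k, leaf_size=16, n_trees=8, seed=0):
    """Approximate k nearest neighbors of every point (self excluded)."""
    rng = random.Random(seed)
    n = len(points)
    best = [{} for _ in range(n)]  # per point: neighbor index -> squared distance
    for _ in range(n_trees):
        # Each tree proposes candidates via exact search inside each leaf.
        for leaf in rp_leaves(points, list(range(n)), leaf_size, rng):
            for i in leaf:
                cand = best[i]
                for j in leaf:
                    if j != i:
                        cand[j] = dist2(points[i], points[j])
                # Merge with earlier trees and keep only the k closest.
                best[i] = dict(heapq.nsmallest(k, cand.items(), key=lambda kv: kv[1]))
    return [sorted(nb, key=nb.get) for nb in best]
```

More trees raise recall at the cost of more leaf searches. Setting `leaf_size >= len(points)` collapses each tree to a single leaf, which reduces the sketch to exact brute force.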
In task scheduling, we present contributions to Parla, a Python-based orchestration system that supports flexible, device-specific scheduling and data movement for workloads across CPUs and GPUs, significantly extending Python's capabilities for parallel processing.
We present an optimized implementation of Parla and use it to study two-language runtime design in Python. It achieves a 57% speedup over Dask for 0.5 ms jobs on 16 threads in the presence of the Global Interpreter Lock (GIL). Unlike with Dask, we observe no slowdown under recent proposals for a GIL-free Python interpreter.
As compute nodes grow more complex, multi-device kernels and libraries that manage sets of devices internally, such as cuBLASmg and cuFFTmg, are becoming more prevalent. We develop and extend support for programming and scheduling tasks that use arbitrary sets of devices and have multiple implementation variants. This model introduces per-device task queuing algorithms and adaptive mapping for such tasks, achieving at least 8x lower overhead than similar multi-GPU tasks in Ray.
We also develop a discrete-event simulator of task scheduling that models data caching, eviction, and routed communication between devices, and we use it to study the task-to-device mapping problem.
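To make the idea of discrete-event simulation concrete, here is a toy sketch. Completion events sit in a time-ordered heap, each device runs one task at a time from a FIFO queue, and a task becomes ready once all of its dependencies finish. The task-to-device mapping is fixed in advance, and the cost model (a flat `transfer_cost` for each input produced on a different device) is a hypothetical simplification. The simulator described above also models caching, eviction, and routed communication, none of which this sketch attempts. All names are illustrative.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    duration: float                  # compute time on its assigned device
    device: int                      # device chosen by some mapping policy
    deps: list = field(default_factory=list)


def simulate(tasks, transfer_cost=0.0):
    """Return each task's finish time under per-device FIFO queues."""
    by_name = {t.name: t for t in tasks}
    pending = {t.name: len(t.deps) for t in tasks}  # unfinished dependencies
    children = {t.name: [] for t in tasks}
    for t in tasks:
        for d in t.deps:
            children[d].append(t.name)

    queues = {}               # device -> ready task names, in arrival order
    busy = set()              # devices currently running a task
    events = []               # min-heap of (completion time, tie-breaker, name)
    tie = itertools.count()
    finish = {}
    now = 0.0

    def launch_ready():
        # Start the head of every idle device's queue at the current time.
        for dev, q in queues.items():
            if dev not in busy and q:
                task = by_name[q.pop(0)]
                # Hypothetical cost: a fixed transfer per remote input.
                moves = sum(by_name[d].device != dev for d in task.deps)
                end = now + task.duration + moves * transfer_cost
                busy.add(dev)
                heapq.heappush(events, (end, next(tie), task.name))

    for t in tasks:
        if pending[t.name] == 0:
            queues.setdefault(t.device, []).append(t.name)
    launch_ready()
    while events:
        now, _, name = heapq.heappop(events)   # advance the clock
        finish[name] = now
        busy.discard(by_name[name].device)
        for c in children[name]:
            pending[c] -= 1
            if pending[c] == 0:
                queues.setdefault(by_name[c].device, []).append(c)
        launch_ready()
    return finish
```

Changing the `device` field of each task and re-running `simulate` is one simple way to compare candidate mappings by makespan, `max(finish.values())`.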
Our work advances the field of heterogeneous computing by providing scalable tools that enable computational scientists and engineers to tackle complex data analysis tasks on multi-device platforms.
William is a PhD candidate working under the supervision of Dr. George Biros at the Oden Institute. His research focuses on high-performance computing, with particular emphasis on runtime systems, randomized numerical methods, and HPC kernel optimization. He received his BS in Computational Mathematics and Computer Science from the New Jersey Institute of Technology.